Combining models in appropriate ways to achieve high performance is
common in machine learning today. Although a large number of
combinatorial models have been created, little attention has been paid
to what different models have in common and how they are connected. A
general modelling technique is thus worth studying, both to understand
model combination deeply and to shed light on creating new models.
Prediction markets show promise of becoming such a generic, flexible
combinatorial model. By reviewing several popular combinatorial models
and prediction market models, this paper aims to show how market models
can generalise different combinatorial structures and how they
implement these popular combinatorial models under specific conditions.
We will also see that, among different market models, Storkey's Machine
Learning Markets provide more fundamental, generic modelling mechanisms
than the others, and have significant appeal for both theoretical study
and application.
1 Introduction

Models that are built by combining individual models are popular in
machine learning. In a combinatorial model, the individual models are
components, and a structure is given to combine the components
appropriately, such as taking the average of all individual outputs or
outputting the majority result. This structure is called the
combinatorial structure. A first impression of combinatorial structures
can be given by two popular families of models, ensemble learning^^ and
graphical models['i\.

Ensemble learning In machine learning, no individual model performs
best in all cases: a model is typically suitable only for a certain type
of dataset, or only for part of a given dataset. However, we can make
good predictions by using multiple models instead of a single one. This
method is called ensemble learning. For example, for any dataset we can
compare the candidate algorithms, choose the one that is most suitable
for this dataset, and then make predictions based on its result
[ini HH]-. By doing this, our prediction will never be worse than that
of the best algorithm on any dataset. Another example is the Netflix
challenge's]. The basic idea of ensemble learning is to treat different
algorithms as components and to combine them in an appropriate way. An
ensemble learning model can achieve high performance on a dataset
without any prior knowledge of it, as long as it has an appropriate way
of combining its components. In other words, with a good combinatorial
structure an ensemble learning model can perform very well.

Graphical models In a graph, the nodes or cliques can be treated as
components and the edges represent how the components are combined, so
it is natural to think of a graph as a combinatorial model. A component
in a graphical model typically gives a marginal belief on only a few
random variables, which often differs from the components in ensemble
learning models, which give their beliefs on all variables.

Generalising the combinatorial structures has a long- 
term appeal for both theoretical study and application. 
It can help understand combinatorial models deeply and 
inspire us to build new models with novel structures. 



In recent years, prediction markets have shown promise of becoming such
a generic combinatorial model[Tl [^ . Prediction markets are designed as
information aggregators, and they naturally combine the agents' beliefs
through market mechanisms [Sj. The goods in the market are contracts
associated with certain outcomes of a future event [23]. Each agent bets
on the outcomes based on its own belief, by buying or selling some
amount of the corresponding goods. When the market reaches equilibrium,
the prices of the goods are the aggregated beliefs for the outcomes
associated with those goods.

A prediction market also has a combinatorial structure, and its agents
are the components. There is much flexibility in representing the
structures in prediction markets, because there are many choices for the
type of contracts and the market mechanisms, and many ways of
representing agent behaviours. Different views on the market structures
therefore result in different prediction market models, such as
Storkey's Machine Learning Markets, Lay's Artificial Prediction Markets
and Chen's Market Maker Prediction Markets.

This paper aims to show how the market mechanisms can generalise the
modelling and implement these models under specific conditions. We will
also compare different market models. The criterion used to discuss
modelling and to compare models is how fundamental and generic a model
can be. Based on this criterion, the paper argues that Storkey's Machine
Learning Markets model provides a better modelling mechanism than the
others.

Our discussion is based on the key papers [10] [20] and [2J [TU [TS] .
The former two papers introduce Storkey's Machine Learning Markets and
the latter three Lay's Artificial Prediction Markets. Both lines of work
successfully build a generic model, but Machine Learning Markets are
built on more fundamental assumptions than Artificial Prediction
Markets. They also discuss the learning process briefly. Other papers
such as [6] [7] and |TT] introduce special agents, the market makers,
and market scoring rules to represent the market mechanisms and to
discuss the learning process. Although introducing a market maker helps
obtain some important results, it makes the prediction market less
general than the former models, which do not introduce any special
agents.

To show how generalisation is achieved, the paper will first review a
few popular and typical models and the structures they hold before
introducing prediction market models.

Chapter 2 introduces the popular combinatorial models and gives a
summary at the end. Chapter 3 prepares for the following discussion by
introducing some basic concepts in prediction markets. Chapter 4
discusses modelling and compares three market models. Chapter 5
discusses learning briefly. Finally, Chapter 6 draws conclusions.

2 Models that have combinatorial 
structures 

Although only a few models are mentioned here, their structures are
quite typical, such as the weighted average of the beliefs and the
product of the beliefs.

2.1 Boosting 

Boosting is a class of algorithms [H]. The idea is to run a weak learner
(which has many weak classifiers) on reweighted training data, and then
let the learned classifiers vote. Boosting is an example of ensemble
learning. There are many implementations of boosting; some famous ones
are AdaBoost^ and Random Forest'^.

AdaBoost The model of AdaBoost is simply a weighted average,

    f(x) = \sum_i \alpha_i h_i(x)    (1)

where \alpha_i is the weight for the weak classifier h_i. Sometimes h_i
is also called a basis, a hypothesis or a "feature". These names just
reflect different views on AdaBoost. For this algorithm, learning seems
more important than modelling: to achieve good performance, it must
choose the weights well. PAC (Probably Approximately Correct) learning
theory [21] supports AdaBoost in choosing the weights appropriately.
Despite the complicated learning process, the structure of AdaBoost is
quite simple.

Random Forest The main difference between Random Forest and other
boosting algorithms is that it introduces stochastic properties into the
model. Because of this, Random Forest is not even treated as a boosting
method. We won't dwell on the terminology issue; it is more important to
see the connections between them. The structure of Random Forest is even
simpler, since the weights are all one,

    f(x) = \sum_i h_i(x)    (2)

Although its structure seems less flexible without weights, Random
Forest also achieves good performance. The reason is that this algorithm
puts more effort into choosing the bases. In fact, each basis h_i(x) is
a tree grown on the training data and controlled by a random vector
\Theta_i, so that h_i(x) = h_i(x, \Theta_i). Like AdaBoost, Random
Forest has a simple structure but a complex learning process.
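The two structures above differ only in the weights. A minimal Python
sketch of the combination step is given below; the decision stumps, the
weights and the function name `combine` are hypothetical illustrations
and are not learned by any boosting procedure.

```python
from typing import Callable, Optional, Sequence

# A weak classifier maps an input to a vote in {-1, +1}.
Classifier = Callable[[float], int]


def combine(classifiers: Sequence[Classifier], x: float,
            weights: Optional[Sequence[float]] = None) -> float:
    """Return sum_i alpha_i * h_i(x); unit weights when `weights` is None."""
    if weights is None:
        weights = [1.0] * len(classifiers)
    return sum(a * h(x) for a, h in zip(weights, classifiers))


# Three hypothetical decision stumps with thresholds 0, 1 and 2.
stumps = [
    lambda x: 1 if x > 0.0 else -1,
    lambda x: 1 if x > 1.0 else -1,
    lambda x: 1 if x > 2.0 else -1,
]
alphas = [0.5, 0.3, 0.2]  # illustrative weights, not learned

weighted_score = combine(stumps, 1.5, alphas)  # weighted structure, as in (1)
unit_score = combine(stumps, 1.5)              # unit-weight structure, as in (2)
```

The sign of the returned score would be the predicted class; only the
combinatorial structure is shown here, not the learning of the weights
or of the bases.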

2.2 Mixture model 

Suppose there is a set of experts, each of which performs well only on
part of the whole data domain. If a model can give its outputs based on
the experts most suitable for the data, the model can achieve high
performance. This kind of model is called a mixture model. One example
of mixture models is the mixture of experts [13] .

Mixture of experts To construct the mixture structure, this model
assigns a different weight to each expert according to the data point.
If an expert explains the data point well, its weight will be larger
than the others. The structure is therefore,

    f(x) = \sum_i w_i(x) h_i(x)    (3)




Figure 1: The graphical model for Product of Hidden 
Markov Models 

Here the weights can be treated as posterior knowledge given the
observations. One special case of mixture of experts is the mixture of
Gaussians, where each expert is a Gaussian distribution and the
corresponding weight w_i(x) is its responsibility for the observed data.
It is worth noting that one graphical model, the hidden space model[A ,
can represent this structure.
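As an illustration of data-dependent weights, the sketch below computes
the responsibilities of a one-dimensional mixture of Gaussians and uses
them as the weights w_i(x) in the mixture structure. The component
parameters and the expert outputs are hypothetical, and the function
names are chosen here for illustration only.

```python
import math
from typing import Callable, List, Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a univariate Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x: float, priors: Sequence[float],
                     means: Sequence[float],
                     stds: Sequence[float]) -> List[float]:
    """Posterior weight of each component given the observation x."""
    joint = [p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def mixture_prediction(x: float, priors: Sequence[float],
                       means: Sequence[float], stds: Sequence[float],
                       experts: Sequence[Callable[[float], float]]) -> float:
    """f(x) = sum_i w_i(x) h_i(x), with w_i(x) the responsibilities."""
    weights = responsibilities(x, priors, means, stds)
    return sum(w * h(x) for w, h in zip(weights, experts))


# Two hypothetical components centred at -1 and +1 with equal priors.
priors, means, stds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
experts = [lambda x: 0.0, lambda x: 2.0]  # constant experts, for illustration

r_mid = responsibilities(0.0, priors, means, stds)
f_mid = mixture_prediction(0.0, priors, means, stds, experts)
```

At the midpoint between the two components both experts are equally
responsible, so the output is the plain average of the expert outputs;
away from the midpoint the nearer component dominates.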

2.3 Product of experts 

This model has a combinatorial structure in product form [12]. The
product is usually associated with graphs, so the components are
represented by nodes or cliques. The components give their own beliefs.
When they are combined by different types of edges (directed or
undirected), the final output is always in the form of the product of
these beliefs. One example is the product of HMMs[T3].

Product of HMMs The graphical model for the product of HMMs is a mixed
graph: it contains both directed and undirected edges (Figure 1). The
probabilities of the variables in this case are complex. However, in
graphs the joint probability distribution has a general product form.
Denote by x the vector that contains several random variables. We have,

    P(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)    (4)

Here \mathcal{C} is the set of cliques in the graph, C \in \mathcal{C}
is a particular clique, x_C denotes the variables contained in the
clique, \psi_C is the belief on the clique, and Z is the normalising
constant. The final output of the model is always derived from the joint
distribution, by either marginalising or introducing evidence. For
example, using (4) we can write the probability of the observed variable
V in this model[13j, which we omit here.
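The product form can be illustrated in the simplest setting, where every
component gives a belief over the same discrete variable and the beliefs
are multiplied and renormalised. This is a simplifying assumption made
for the sketch: a real product of HMMs has cliques over different
variables, and the function name is hypothetical.

```python
from typing import List, Sequence


def product_of_beliefs(beliefs: Sequence[Sequence[float]]) -> List[float]:
    """Multiply the beliefs outcome by outcome, then divide by Z."""
    n_outcomes = len(beliefs[0])
    unnormalised = [1.0] * n_outcomes
    for belief in beliefs:
        for k in range(n_outcomes):
            unnormalised[k] *= belief[k]
    z = sum(unnormalised)  # normalising constant
    return [u / z for u in unnormalised]


# Two hypothetical components over a binary variable.
combined = product_of_beliefs([[0.5, 0.5], [0.8, 0.2]])
```

A component with a uniform belief leaves the product unchanged, so the
combined belief in this example equals that of the second component.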

2.4 Summary 

So far this paper has introduced several models and the 
combinatorial structures they represent. There are some 
common features in these structures. It's helpful to sum- 
marise these features before we discuss the generalisation 
of the structures. 

In combinatorial structures, a component is an agent which has its own
belief on the data. Different components (agents) act differently, by
mapping the data to different probability distributions. To show this
idea more clearly, we consider a sample space \Omega and its
\sigma-field \mathcal{F}, and suppose the data are drawn from this space
(\Omega, \mathcal{F}). What the agents actually do is give their own
probability measures on (\Omega, \mathcal{F}). Thus agent i has its own
probability space (\Omega, \mathcal{F}, P_i).

Combining these agents is actually a process of finding an appropriate
map \mathcal{T} : {P_i} \mapsto P, where P is still a probability
measure on (\Omega, \mathcal{F}). We have already seen some examples of
\mathcal{T}; in boosting it is the weighted average. Because this
definition is so flexible, \mathcal{T} should not be restricted to a few
forms such as the weighted average or the product, although only these
structures have been seen in this paper so far.

Sometimes it is necessary to consider the situation in which some agents
have their beliefs defined on a subspace of (\Omega, \mathcal{F}). For
example, in product models the beliefs of agents are only defined on the
cliques; they are also called local beliefs or marginal beliefs. There
are two ways of interpreting these beliefs. The first is to introduce
the subspace (\Omega_i, \mathcal{F}_i). The probability space of agent i
is then (\Omega_i, \mathcal{F}_i, P_i), and thus P_i(\Omega_i) = 1. The
second is to "slice" the probability measure using the Dirac \delta, so
that the probability of an event can be positive only if it is in
(\Omega_i, \mathcal{F}_i). However, compared with the first way, the
subspace is not explicitly defined.

Suppose the data are drawn from the space (\Omega, \mathcal{F}, P^*),
where P^* is the true probability distribution of the data. The learning
process aims to give a P that is as close to P^* as possible. In a
combinatorial structure, P is constructed from the beliefs of the agents
{P_i}. We can therefore think of {P_i} as bases that form a hypothesis
space. That is why in AdaBoost a component is also called a basis or a
hypothesis. This view also helps in understanding modelling and learning
in combinatorial structures.

3 Basic Concepts in Prediction Markets 

To bridge the gap between economics and machine learning, the paper
first introduces some basic concepts in prediction markets.

3.1 Definitions 

Definition 1. A market is a mechanism for the exchange 
of goods. The market itself is neutral with respect to the 
goods or the trades. The market itself cannot acquire or 
owe goods. Unless explicitly stated otherwise, perfect liq- 
uidity is assumed and there is no transaction fee. All par- 
ticipants in the market use the same currency for the pur- 
poses of trade. 

Definition 2. The market equilibrium is a price point for 
which all agents acting in the market are satisfied with their 
trades and do not want to trade any further. 

Definition 3. A prediction market (also predictive market, information
market) is a speculative market created for the purpose of making
predictions. Current prices in market equilibrium can be interpreted as
predictions of the probability of an event or the expected value of a
parameter. One good is used as currency and each of the remaining goods
is a bet on a particular outcome of a future occurrence. A bet pays off
if and only if the particular outcome associated with this bet actually
occurs.

Types of Prediction Market We can design different types of bets
(contracts) to realise different types of predictions in prediction
markets (Table 1). For the purpose of predicting probabilities, we
choose the winner-take-all type.

Definition 4. A machine learning market (or artificial prediction
market) is a special type of prediction market. Market participants are
classifiers, which are also called agents. The market uses the
winner-take-all contract. In the market the agents bet on the outcomes
of future events. They buy and sell bets based on their own beliefs.
Prices in market equilibrium estimate the probabilities of the outcomes.

Definition 5. Utility is a measure of satisfaction, referring to the
total satisfaction received by a consumer from consuming a good or
service. A utility function is defined on the current wealth, and maps
the satisfaction to a set of ordinal numbers, for which the common
choice is \mathbb{R}. Any function is appropriate as long as it
preserves the same ordering.

Definition 6. A buying function (also betting function) 
represents how much an agent is willing to pay for a good 
(or how many contracts it would like to buy) depending on 
its price. 

3.2 Notations 

For consistency, this paper follows the notation in [20] and [TUl.

• An outcome is an event defined on the sample space with a proper
  \sigma-field, (\Omega, \mathcal{F}). This space is mapped to a random
  vector x and each outcome is denoted by y. In prediction markets each
  outcome is a good. Goods are enumerated by k = 1, 2, ..., N_G. In our
  discussion, {y_k} are mutually exclusive. For simplicity, k can also
  denote the good {y_k}. If there is only one random variable, the
  vector x reduces to a variable, and N_G is the number of its outcomes.
  If there are N variables in x, the number of outcomes increases
  exponentially with N.

• The price of good k is denoted by c_k. c = (c_1, c_2, ..., c_{N_G})^T
  denotes the price vector, and \sum_k c_k = 1 matches its probability
  meaning.

• The agents are enumerated by i = 1, 2, ..., N_A. The wealth of agent i
  is denoted by W_i. The beliefs of agents on x are denoted by P_i(x),
  so the belief of agent i on good k is P_i(k), with \sum_k P_i(k) = 1.

• The stockholding of agent i in good k is denoted by s_{ik}. A negative
  stockholding indicates that the agent sells the good, so s_{ik} can be
  negative. s_i = (s_{i1}, s_{i2}, ..., s_{iN_G})^T denotes the
  stockholding vector of agent i.

• The utility of agent i is denoted by U_i. The buying function of agent
  i is denoted by s_{ik}(W_i, c).

• If x contains multiple variables, they are enumerated by
  j = 1, 2, ..., J. As discussed in Chapter 2, we can introduce
  subspaces or cliques so that beliefs are defined on only part of the
  variables in x. The subspace of agent i is denoted by S_i, and the
  variables in S_i are denoted by x^{S_i} = {x_j | x_j \in S_i}. The
  agent's belief is denoted by P_i(y^{S_i}), where {y^{S_i}} are the
  outcomes of x^{S_i}.

• The training points are enumerated by t = 1, 2, ..., T. The t-th point
  is denoted by D^t.

4 Modelling with Prediction Markets 

The basic idea of modelling is that markets interpret the agents'
behaviours in an appropriate way and describe how they interact with
each other. The connection between market prices and beliefs then
reveals the relationship between markets and other model structures.

4.1 General market mechanism 

The prices of goods are determined by the equilibrium state of the
market. Different agents interact with each other by buying (selling)
their preferred amount of goods from (to) others. Their behaviours are
described by the buying functions. When the market reaches equilibrium,
supply matches demand, so no one wants to trade any more goods:

    \sum_{i=1}^{N_A} s_i(W_i, c) = 0    (5)

Note that s_i(W_i, c) is a vector, so there are N_G equations for N_G
goods. Substituting all buying functions into (5), we can solve for the
prices of the N_G goods. The prices are the aggregated probability
distribution on the future events.

In many situations calculating the market equilibrium using (5) is
difficult. To simplify the numerical calculation, |TD] gives a score
function called the market equilibrium function,

    E(c) = \sum_k \left( \sum_i s_{ik}(W_i, c) \right)^2    (6)

Here E(c) \geq 0, and equality holds if and only if (5) holds. Therefore
minimising E(c) gives the market equilibrium.
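The minimisation can be sketched numerically for a market with two
goods. The code below is only an illustration under stated assumptions:
it takes the buying function s_ik = W_i (P_i(k) - c_k) / c_k (the
logarithmic-utility case of Section 4.3) as given, uses hypothetical
wealths and beliefs, and replaces a proper optimiser with a plain grid
search over c = (p, 1 - p).

```python
from typing import Callable, List, Sequence

BuyingFunction = Callable[[float, Sequence[float], Sequence[float], int], float]


def log_utility_buy(wealth: float, belief: Sequence[float],
                    prices: Sequence[float], k: int) -> float:
    """Assumed buying function: s_ik = W_i * (P_i(k) - c_k) / c_k."""
    return wealth * (belief[k] - prices[k]) / prices[k]


def equilibrium_score(prices: Sequence[float], wealths: Sequence[float],
                      beliefs: Sequence[Sequence[float]],
                      buy: BuyingFunction) -> float:
    """E(c): sum over goods of the squared net demand of all agents."""
    score = 0.0
    for k in range(len(prices)):
        net = sum(buy(w, b, prices, k) for w, b in zip(wealths, beliefs))
        score += net * net
    return score


def grid_search_two_goods(wealths: Sequence[float],
                          beliefs: Sequence[Sequence[float]],
                          buy: BuyingFunction, steps: int = 1000) -> List[float]:
    """Minimise E(c) over c = (p, 1 - p) on a uniform grid in (0, 1)."""
    best_p, best_score = 0.5, float("inf")
    for j in range(1, steps):
        p = j / steps
        score = equilibrium_score([p, 1.0 - p], wealths, beliefs, buy)
        if score < best_score:
            best_p, best_score = p, score
    return [best_p, 1.0 - best_p]


# Hypothetical market: two agents, two goods.
wealths = [1.0, 3.0]
beliefs = [[0.2, 0.8], [0.6, 0.4]]
prices = grid_search_two_goods(wealths, beliefs, log_utility_buy)
```

For these inputs the minimiser coincides with the wealth-weighted
average of the two beliefs. A grid search does not scale beyond a
handful of goods; it is used here only to keep the sketch free of
external dependencies.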

Market equilibrium is therefore the general market mechanism that
aggregates the individual beliefs in the market. The buying function
s_i(W_i, c) has not been specified here. Implementing the buying
function in different ways gives different models.

4.2 Model buying function: Artificial Prediction Markets

One way to implement the buying function is to model it directly
[H [TH [13]. These papers call



Contract         Details (with the example that the       Reveals market expectation
                 Liberty Party wins the vote)             of ...

Winner-take-all  Costs £p. Pays £1 if and only if the     Probability that event y
                 Party wins y%. Bid according to the      occurs, p(y)
                 value of £p.

Index            Pays £1 for every percentage point       Mean value of outcome y:
                 won by the Party.                        E[y]

Spread           Costs £1. Pays £2 if the percentage      Median value of y
                 y > y*. Pays £0 otherwise. Bid
                 according to the value of y*.

Table 1: Different contract types in prediction markets



the buying function a "betting function", and they argue that its form
can be chosen arbitrarily. These papers define the buying function in a
factorised form,

    s_{ik}(W_i, c) = W_i \phi_i(k, c)    (7)

where \phi_i(k, c) is the proportion of its wealth that the agent would
like to use. These papers give three types of proportion functions.

• Constant proportion functions

      \phi_i(k, c) = P_i(k)    (8)

• Linear proportion functions

      \phi_i(k, c) = (1 - c_k) P_i(k)    (9)

• Aggressive proportion functions

      \phi_i(k, c) = \begin{cases}
          \ldots & \text{if } c_k < P_i(k) \\
          \ldots & \text{if } c_k > P_i(k) \\
          \ldots & \text{otherwise}
      \end{cases}    (10)

The advantage of modelling the buying functions directly is that we can
assume functions with simple forms and obtain the results analytically.
For example, using the constant proportion function and (5), we have

    c_k = \frac{\sum_i W_i P_i(k)}{\sum_i W_i},    k = 1, 2, ..., N_G    (11)

This is exactly the weighted average of all beliefs, and it interprets
the structure of AdaBoost in (1). In particular, if the weights are all
the same, it interprets Random Forest in (2).

4.3 Model utility: Machine Learning Markets

The drawback of this formulation, however, is that the buying function
doesn't always have a factorised form. Moreover, from the point of view
of economics, the buying function is not the foundation for interpreting
the agent's behaviour [22]; the utility function is. According to [22],
the buying function is derived from the agent's rational behaviour: the
agent always wants to maximise its utility subject to its budget
constraint. The utility function is defined on the wealth. In prediction
markets, the utility is uncertain. The expected utility function (called
the Von Neumann-Morgenstern utility) can be written as [20],

    E[U_i] = \sum_{k=1}^{N_G} P_i(k) U_i(W_i - s_i^T c + s_{ik}),    i = 1, 2, ..., N_A    (12)

This formulation introduces a degree of freedom: agents can change their
stockholdings by making risk-free trades, namely changing the holding
from s_i to s_i + \alpha 1 (where 1 is the vector of ones), and these
trades don't affect their utilities (the utilities keep the same
ordering). We can introduce a gauge, or standardisation constraint, to
eliminate this degree of freedom. One choice of gauge is,

    s_i^T c = 0    (13)

Then (12) is rewritten as,

    E[U_i] = \sum_{k=1}^{N_G} P_i(k) U_i(W_i + s_{ik})
    s.t.  s_i^T c = 0,    i = 1, 2, ..., N_A    (14)

It is worth noting that (12) doesn't guarantee invariance under
translation, because in general

    U_i(W_i - (s_i + \alpha 1)^T c + s_{ik}) \neq U_i(W_i - s_i^T c + s_{ik})    (15)

The invariance can hold only when U_i(x) = U_i(x + t), which may not
always be met. In [6] and [7] the authors construct a cost function that
always preserves translational invariance.

Maximising the utility function gives the buying function. For agent i,
the buying function is,

    s_i(W_i, c) = \arg\max_{s_i} E[U_i]    s.t.  s_i^T c = 0    (16)

To find the maximum we take derivatives with respect to each s_{ik}.
Using a Lagrange multiplier \lambda_i to include the gauge, we have

    P_i(k) U_i'(W_i + s_{ik}) - \lambda_i c_k = 0    (17)

Solving the above for s_{ik} = s_{ik}(W_i, c, \lambda_i) and combining
it with the gauge (13), we can finally obtain the buying function
s_{ik}(W_i, c). In [10] the author gives three types of utility
functions and derives the corresponding buying functions.



• Logarithmic

      U_log(x) = \log(x) if x > 0, -\infty otherwise
      s_{ik}(W_i, c) = \frac{W_i (P_i(k) - c_k)}{c_k}    (18)

  This buying function has a linear form.

• Exponential

      U_exp(x) = -\exp(-x)
      s_{ik}(W_i, c) = \log P_i(k) - \log c_k    (19)

  Because the exponential utility function satisfies
  -e^{-w-x} = -e^{-w} e^{-x}, the wealth term in the utility function is
  eliminated when taking derivatives. Therefore the buying function does
  not depend on the wealth.

• Isoelastic (\eta > 0)

      U_iso(x) = \frac{x^{1-\eta} - 1}{1 - \eta}
      s_{ik}(W_i, c) = W_i \left[ \frac{(P_i(k)/c_k)^{1/\eta}}{\sum_{j=1}^{N_G} c_j (P_i(j)/c_j)^{1/\eta}} - 1 \right]    (20)

  Note that when \eta \to 1, it becomes the logarithmic case.
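As a worked example of the procedure, the logarithmic buying function
can be derived from the stationarity condition (17) and the gauge (13).
The steps below are a sketch that relies only on \sum_k c_k = 1,
\sum_k P_i(k) = 1 and s_i^T c = 0.

```latex
\begin{align*}
U_i'(x) = \frac{1}{x}
  \;&\Rightarrow\; \frac{P_i(k)}{W_i + s_{ik}} = \lambda_i c_k
  \;\Rightarrow\; W_i + s_{ik} = \frac{P_i(k)}{\lambda_i c_k}, \\
\sum_k c_k \,(W_i + s_{ik}) &= \frac{1}{\lambda_i} \sum_k P_i(k)
  \;\Rightarrow\; W_i = \frac{1}{\lambda_i}, \\
s_{ik}(W_i, c) &= \frac{W_i P_i(k)}{c_k} - W_i
  = \frac{W_i \,\bigl(P_i(k) - c_k\bigr)}{c_k}.
\end{align*}
```

The second line multiplies the first by c_k and sums over k, so the
multiplier is fixed by the agent's wealth and the resulting buying
function is linear in W_i.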

4.4 Implementing combinatorial structures 

In the previous sections, we have seen how prediction markets generally
work to aggregate information and express it in the prices of goods.
Aggregation is determined by market equilibrium, which is based on the
agents' behaviours as described by their utilities. There is no
restriction on choosing the agents' utilities, so the agents can either
all have the same utility function or each have its own. A market whose
agents share the same utility is called a homogeneous market; otherwise
it is called an inhomogeneous market. An inhomogeneous market is more
general, since it becomes a homogeneous one when the same utility is
assigned to all agents. However, even a homogeneous market can implement
the popular combinatorial structures mentioned before and can bring
completely new structures, let alone an inhomogeneous market, which may
give many more outcomes. The prediction market models, Storkey's Machine
Learning Markets [10] [20] and Lay's Artificial Prediction Markets
[H [TU [TS] , both use homogeneous markets to form the combinatorial
structures.

Homogeneous market with logarithmic utilities

Using (18) and the market equilibrium condition (5), we have,

    c_k = \frac{\sum_i W_i P_i(k)}{\sum_i W_i}    (21)

This is the same as (11), which indicates that Machine Learning Markets
can give the same results as Artificial Prediction Markets.



Homogeneous market with exponential utilities

Using (19) and the market equilibrium condition (5), we have,

    c_k \propto \prod_{i=1}^{N_A} P_i(k)^{1/N_A}    (22)

This interprets the structure of the product of HMMs in (4). Here every
clique C on which the beliefs are defined is a Markov chain (Figure 1).
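The product structure can be checked numerically. The sketch below
assumes the exponential buying function s_ik = log P_i(k) - log c_k,
shifted for each agent by a risk-free constant so that the gauge
s_i^T c = 0 holds, and verifies that prices proportional to the
geometric mean of the beliefs give zero net demand. The beliefs and
function names are hypothetical.

```python
import math
from typing import List, Sequence


def exp_market_prices(beliefs: Sequence[Sequence[float]]) -> List[float]:
    """Prices proportional to prod_i P_i(k)^(1/N_A), normalised to sum to 1."""
    n_agents = len(beliefs)
    n_goods = len(beliefs[0])
    geo = [math.exp(sum(math.log(b[k]) for b in beliefs) / n_agents)
           for k in range(n_goods)]
    z = sum(geo)
    return [g / z for g in geo]


def net_demand(beliefs: Sequence[Sequence[float]],
               prices: Sequence[float]) -> List[float]:
    """Sum over agents of the gauge-shifted exponential buying function."""
    n_goods = len(prices)
    net = [0.0] * n_goods
    for b in beliefs:
        raw = [math.log(b[k]) - math.log(prices[k]) for k in range(n_goods)]
        # Risk-free shift that enforces the gauge s_i^T c = 0.
        shift = sum(c * r for c, r in zip(prices, raw))
        for k in range(n_goods):
            net[k] += raw[k] - shift
    return net


# Two hypothetical agents over two goods; beliefs must be strictly positive.
beliefs = [[0.5, 0.5], [0.8, 0.2]]
prices = exp_market_prices(beliefs)
demand = net_demand(beliefs, prices)
```

Wealth does not appear anywhere in the computation, which reflects the
fact that the exponential buying function is independent of wealth.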



Homogeneous market with isoelastic utilities

Using (20) and the market equilibrium condition (5), we have,

    c_k = \left[ \frac{1}{\sum_i W_i} \sum_{i=1}^{N_A} \frac{W_i P_i(k)^{1/\eta}}{\sum_{j=1}^{N_G} c_j (P_i(j)/c_j)^{1/\eta}} \right]^{\eta}    (23)

This is not a closed form, because the right-hand side contains the
prices. It is also a novel combinatorial structure: none of the models
discussed so far has this kind of structure. Despite the lack of model
examples, we can infer some properties of this structure from (23).

• Similar to the logarithmic case, the agents that have large weights
  W_i (that is, the wealthier agents) contribute more to the market
  pricing. They tend to pull the market price close to their beliefs. In
  economics, these agents are described as acting like "price makers"
  [3 Bin].

• The personal belief is not as important as in the logarithmic case.
  Instead, it is the relative belief P_i(k)/c_k that really affects the
  prices. If an agent is wealthy but its belief deviates strongly from
  the others, the market will still treat it as an "outlier" and reduce
  its contribution.



4.5 Agents with beliefs on subspaces 

So far we have discussed agents that always have beliefs on the whole
random vector x. This is natural if x is actually a single random
variable, that is, x contains only one entry. However, if there are
multiple entries in x, it is more general to let agents have beliefs on
only part of x. Introducing agents with marginal beliefs makes a model
more general, since the agents are no longer required to have knowledge
of all random variables. As mentioned in Chapter 2, there are two ways
to define a belief on a subspace.

The paper [20] introduces the subspace explicitly and defines the belief
on it. Suppose x contains J variables, x = (x_1, x_2, ..., x_J)^T, and
agent i has its belief only on those variables in its subspace S_i. Then
the belief is written as P_i(y^{S_i}), where {y^{S_i}} are the outcomes
of the subspace variables x^{S_i}. Because P_i(y^{S_i}) is a marginal
probability distribution, [20] calls the agents that have their beliefs
on subspaces the marginal agents.

The expected utility function for marginal agents is written as,

    (24)

where c(y^{S_i}) = \sum_{y \setminus y^{S_i}} c(y). If S_i is exactly
the whole space, then the above equation reduces to (12). There is no
simple representation for the prices with marginal agents, but we know
the market will give prices based on these marginal beliefs.

4.6 Prediction markets and the map \mathcal{T} : {P_i} \mapsto P

Flexible choices of utilities (buying functions) and beliefs make market
models general. Again we can understand market models using the map.
What a combinatorial structure provides is a map from the agents'
beliefs {P_i} to the aggregated belief P. The prediction market can
implement a large number of combinatorial structures by choosing
different utilities or buying functions. In the previous sections we
have seen how the market yields popular structures, such as the
homogeneous market with logarithmic or exponential utilities, as well as
a structure that has not been used in any combinatorial model, the
homogeneous market with isoelastic utilities. The flexibility of the
prediction market in implementing the map \mathcal{T} : {P_i} \mapsto P
shows that it is likely to be a generic combinatorial model.

One interesting question is how many such maps can be interpreted by the
market. So far no work has been done on this, but answering this
question would help refine the theory, so it is worth further study.

4.7 Another market mechanism: Market Maker Prediction Markets

Besides Storkey's Machine Learning Markets and Lay's Artificial
Prediction Markets, another line of work in this field is due to Chen et
al. [6] [7]. They introduce a market maker to help analyse the pricing
of goods and the bounded loss of learning. In [7], the authors prove the
equivalence of three different market maker mechanisms: the market
scoring rule (MSR) market maker, the utility-based market maker and the
cost-function-based market maker. It is worth noting that the
utility-based market maker is quite similar to Machine Learning Markets,
in that the market maker's behaviour is based on its expected utility
function. However, instead of maximising the utility function, the
agents in the market aim to maximise their expected wealth subject to
the invariance of the market maker's utility. Suppose there are only two
agents, a trader (denoted by t) and the market maker (denoted by m).
Then the market mechanism is written as,

    \max \sum_k P_t(k) (-m_k)    s.t.  \sum_k P_m(k) U_m(m_k) = C    (25)

where m_k is the money the market maker spends on good k. The money the
trader spends is -m_k, under the assumption that the total wealth of the
two agents is 0. C is a constant.

Introducing a special agent has its own drawback: it makes the market
less general, because in many cases there should be no special agents in
a market. Compared with the Market Maker Prediction Market, the Machine
Learning Markets model doesn't introduce any special agents and is thus
better suited to being a generic model.

5 Learning Process

The problems of learning discussed here refer not only to training but
also to evaluation. Given a model, people are most curious about how
well it can actually perform. For prediction markets neither problem has
been completely solved yet. However, current results do show that
prediction market models perform well, at least from the Bayesian view.

5.1 Training market models

In |TD] the author discusses two ways of training the homogeneous market
with logarithmic utilities, whose structure is represented by (21).
Because the agents' beliefs don't change, only their wealths {W_i} are
updated during training. Training is therefore also called wealth
updating in this paper. The two ways of wealth updating are the online
update and the batch update. Online update means that {W_i} are updated
after every training sample. Batch update means that each agent's wealth
is divided into equal pieces, one piece for each training point, and the
updated wealth is the sum of all the updated pieces. Before training, we
need to choose appropriate initial wealths. For example, we can choose
uniform wealths {1/N_A}. Denote them by {W_i^0}.

Online update

    W_i^{t+1} = W_i^t - s_i^T c + s_{ik^t}
              = W_i^t + \frac{W_i^t (P_i(k^t) - c_{k^t})}{c_{k^t}}
              = \frac{W_i^t P_i(k^t)}{c_{k^t}}    (26)

where k^t denotes the true outcome of training point D^t. Because of the
standardisation constraint s_i^T c = 0, we have
W_i^t - s_i^T c + s_{ik^t} = W_i^t + s_{ik^t}. After training the market
on T data points, we get the final wealth of each agent, W_i = W_i^T.

Batch update

    W_i^T = \sum_{t=1}^{T} \left[ \frac{W_i^0}{T} + \frac{W_i^0}{T} \frac{P_i(k^t) - c_{k^t}}{c_{k^t}} \right]
          = \sum_{t=1}^{T} \frac{W_i^0}{T} \frac{P_i(k^t)}{c_{k^t}}    (27)
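The online update can be sketched in a few lines. The code below assumes
the logarithmic-utility market, in which the price of each good is the
wealth-weighted average of the beliefs and each agent's wealth is
multiplied by P_i(k) / c_k for the observed outcome k. The beliefs, the
outcome sequence and the function names are hypothetical; the second
function computes a Bayesian posterior directly, for comparison.

```python
from typing import List, Sequence


def market_prices(wealths: Sequence[float],
                  beliefs: Sequence[Sequence[float]]) -> List[float]:
    """Wealth-weighted average of the agents' beliefs."""
    total = sum(wealths)
    n_goods = len(beliefs[0])
    return [sum(w * b[k] for w, b in zip(wealths, beliefs)) / total
            for k in range(n_goods)]


def online_update(wealths: Sequence[float],
                  beliefs: Sequence[Sequence[float]],
                  outcomes: Sequence[int]) -> List[float]:
    """Apply W_i <- W_i * P_i(k) / c_k for each observed outcome k in turn."""
    current = list(wealths)
    for k_true in outcomes:
        prices = market_prices(current, beliefs)
        current = [w * b[k_true] / prices[k_true]
                   for w, b in zip(current, beliefs)]
    return current


def bayes_posterior(priors: Sequence[float],
                    beliefs: Sequence[Sequence[float]],
                    outcomes: Sequence[int]) -> List[float]:
    """Posterior over agents: prior times likelihood, then normalised."""
    joint = list(priors)
    for k_true in outcomes:
        joint = [j * b[k_true] for j, b in zip(joint, beliefs)]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical agents, uniform initial wealth, five training outcomes.
initial = [0.5, 0.5]
beliefs = [[0.9, 0.1], [0.5, 0.5]]
outcomes = [0, 0, 0, 1, 0]

final_wealth = online_update(initial, beliefs, outcomes)
posterior = bayes_posterior(initial, beliefs, outcomes)
```

With normalised initial wealth the total wealth is conserved at every
step, and the final wealths agree with the posterior probabilities of
the agents given the data.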



5.2 Bayesian view on training 

Now we think of the agents' personal beliefs in another way. Suppose the
agents are represented by a random variable z, which has outcomes
i = 1, 2, ..., N_A. The joint probability distribution over the random
vector (x^T, z)^T is denoted by P'(x, z). Then each agent's belief
P_i(k) is the conditional probability P'(x = k | z = i), namely
P_i(k) = P'(x = k | z = i). For simplicity we write
P'(k | i) = P'(x = k | z = i). In effect, we treat the agents as the
outcomes of a latent variable z. If we marginalise over z, we obtain
P'(k).

The market prices form another probability distribution 
on x, c_k = P(k). Note that P' and P are two different 
probability measures on the sample space. P' is obtained 
directly by the sum rule, while P represents an arbitrary 
aggregation process. Only in one special case do we have 
P' = P: when the aggregation process is the (weighted) sum 
over all agents, which is the same as marginalisation. The 
homogeneous market with logarithmic utilities provides 
exactly such a process. When P' = P we have, 

P(k) = P'(k) = \sum_i P'(k, i) 
     = \sum_i P'(i) P'(k|i) = \sum_i P'(i) P_i(k)    (28) 

where P'(i) can be treated as the weight of agent i, W_i = P'(i) 
(suppose W_i has been normalised). This is exactly the 
Bayesian learning in j^j. Now we consider the learning 
process and introduce training data. 

P(k|D) = \sum_i P'(i|D) P_i(k|D)    (29) 

We see that W_i = P'(i|D). So W_i is the responsibility 
of agent i for the data D, or the posterior probability of 
agent i given the data. In Bayesian learning, as the number 
of data points increases, the posterior distribution P'(z|D) 
becomes sharply peaked. After enough training points, the 
agent whose belief is closest to the true probability 
distribution will gain the largest weight, while all the 
others have nearly zero weight. Therefore, in the homogeneous 
market with logarithmic utilities, the prices will eventually 
be set by the best agent. 
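
To see why the wealth W_i coincides with the posterior P'(i|D), one can unroll the online wealth update W_i <- W_i P_i(k^*) / c_{k^*}. The derivation below is a sketch under assumptions not spelled out at this point in the text: the wealths are normalised, the price is the weighted sum c_k = \sum_j W_j P_j(k), the data points k_1^*, ..., k_T^* are independent given the agent, and the initial wealth (written W_i^{init} here, a label chosen for this sketch) plays the role of the prior P'(i).

```latex
\begin{align*}
W_i^{\text{new}} &= \frac{W_i \, P_i(k_t^*)}{\sum_j W_j \, P_j(k_t^*)}
  && \text{(one update; the wealths stay normalised)} \\
W_i^{\text{final}} &= \frac{W_i^{\text{init}} \prod_{t=1}^{T} P_i(k_t^*)}
                          {\sum_j W_j^{\text{init}} \prod_{t=1}^{T} P_j(k_t^*)}
  && \text{(after } T \text{ updates, by induction)} \\
 &= \frac{P'(i) \, P'(D \mid i)}{\sum_j P'(j) \, P'(D \mid j)} = P'(i \mid D)
  && \text{(Bayes' rule)}
\end{align*}
```

Each update is thus one step of sequential Bayesian inference over the latent agent index, which is why the wealths concentrate as T grows.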

In fact we can understand this Bayesian process as 
Bayesian model averaging. It is often used to select the 
best algorithm for a given dataset from among a number of 
algorithms, but it can select only one algorithm [17]. A 
rigorous derivation shows that, with an infinite number of 
training points T \to \infty, if the hypothesis space 
contains the true probability distribution, Bayesian model 
averaging will select it; if the hypothesis space does not 
contain the true one, it will select the distribution that 
is closest to the true one. 
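
This selection behaviour is easy to observe numerically. The sketch below is an illustration only: the true distribution, the three agents' beliefs, the sample size and the random seed are all hypothetical choices, and none of the agents holds the true distribution. It applies the wealth update of a logarithmic-utility market repeatedly and reports the final weights.

```python
import random

def posterior_weights(beliefs, data, prior=None):
    """Sequentially apply W_i <- W_i * P_i(k) / c_k with c_k = sum_j W_j P_j(k)."""
    n_agents = len(beliefs)
    w = list(prior) if prior is not None else [1.0 / n_agents] * n_agents
    for k in data:
        c_k = sum(wi * p[k] for wi, p in zip(w, beliefs))
        w = [wi * p[k] / c_k for wi, p in zip(w, beliefs)]
    return w

rng = random.Random(0)                 # fixed seed so the run is repeatable
true_dist = [0.7, 0.2, 0.1]            # hypothetical data-generating distribution
agents = [
    [0.6, 0.3, 0.1],                   # closest to the true distribution
    [0.2, 0.5, 0.3],
    [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0],
]
data = rng.choices(range(3), weights=true_dist, k=500)
weights = posterior_weights(agents, data)
print(weights)
```

With these settings nearly all of the weight ends up on the first agent, whose belief is closest to the true distribution, even though no agent is exactly right.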

Therefore, with enough points, the homogeneous market 
with logarithmic utilities will perform as well as the 
best agent in the market. The two have the same loss, which 
means that how well the market can perform depends on 
how good the best agent is. 

5.3 Further discussion 

Bayesian learning seems to have solved the learning problem 
elegantly, but in fact it is too idealised. The condition 
"with enough points" can never be met, because it requires 
T \to \infty training points. It is much more important to 
answer how well the market can perform with finite or few 
training points. In fact, this question is where some 
learning theories originate, such as Vapnik-Chervonenkis 
theory. 

For some models, such as boosting, this question has 
been well studied. So if the market mechanism results in 
a boosting algorithm, we know the performance of the 
market. However, we do not know its performance if the 
market gives other combinatorial structures. What we 
want to know is whether it is possible to characterise 
the performance of the market in general, regardless of 
the combinatorial structure it represents. 

Some work has been done by Chen et al., and their discussion 
is based on introducing the market maker. In [6] they 
show that "any cost function based prediction market with 
bounded loss can be interpreted as a no-regret learning 
algorithm". Besides, in [7] they show the equivalence 
between cost function based prediction markets and utility 
based prediction markets under some conditions. As 
we mentioned before, it may be possible to connect the market 
maker model and the Machine Learning Markets model. For 
example, we could specify one agent in Machine Learning 
Markets as a market maker. If this is possible, the results of 
the market maker model could be applied to Machine Learning 
Markets, making this model better and more general. 

6 Conclusion 

Prediction markets have shown great potential to become 
a generic combinatorial model. In this review, we have 
seen how prediction markets provide flexible mechanisms 
to implement different combinatorial structures. By 
assigning specific utilities or buying functions to each agent 
and using the market equilibrium condition, Artificial 
Prediction Markets and Machine Learning Markets can give not 
only the structures of some well-known models (logarithmic 
utilities implement the averaging structure, and exponential 
utilities implement the product structure), but also ones 
that have never been used (isoelastic utilities). More 
generally, prediction markets provide many choices of 
the map \mathcal{T} : {P_i} \mapsto P, where each \mathcal{T} 
stands for a combinatorial structure. In this sense the 
prediction market models are general. 
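
The idea of a map from the agents' beliefs {P_i} to a single market distribution P can be made concrete with the two well-known structures mentioned above. The following is a minimal sketch, not the market equilibrium computation itself: the averaging structure is written as a weighted mixture and the product structure as a normalised weighted geometric mean. The weights, the function names and the toy beliefs are hypothetical, and the product form assumes strictly positive beliefs.

```python
import math

def averaging_structure(beliefs, weights):
    """Weighted mixture: P(k) = sum_i w_i P_i(k), with weights normalised."""
    total = sum(weights)
    n_outcomes = len(beliefs[0])
    return [
        sum(w * p[k] for w, p in zip(weights, beliefs)) / total
        for k in range(n_outcomes)
    ]

def product_structure(beliefs, weights):
    """Normalised weighted product: P(k) proportional to prod_i P_i(k)^{w_i}."""
    total = sum(weights)
    n_outcomes = len(beliefs[0])
    unnorm = [
        math.exp(sum((w / total) * math.log(p[k]) for w, p in zip(weights, beliefs)))
        for k in range(n_outcomes)
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm]

beliefs = [[0.8, 0.2], [0.4, 0.6]]   # two agents, two outcomes (toy values)
weights = [0.5, 0.5]
mix = averaging_structure(beliefs, weights)
prod = product_structure(beliefs, weights)
print(mix, prod)
```

The two maps take the same inputs and return different aggregate distributions, which is what it means for each choice of map to stand for a different combinatorial structure.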

This paper has first compared two prediction market 
models: Artificial Prediction Markets, which model the 
buying functions, and Machine Learning Markets, which 
model the utilities. Both perform well in generalising 
combinatorial models. However, because utility is 
conceptually more fundamental than the buying function and 
gives rise to the buying function naturally, the Machine 
Learning Markets model is more general and more elegant than 
Artificial Prediction Markets. The paper has also discussed 
another market model, Market Maker Prediction Markets. 
This model introduces a special agent, 
the market maker, to help analyse the market mechanism 
and the pricing (namely learning) process. Admittedly, some 
significant results are obtained this way. However, 
introducing the special agent makes the model less generic 
than the former two, which is a serious drawback. Therefore, 
based on the criterion that a better model should be more 
fundamental and more general, the paper concludes that 
Machine Learning Markets is the better one. Possible future 
work is to find the connections between Machine 
Learning Markets and Market Maker Prediction Markets, 
in order to use these results to refine the Machine 
Learning Markets model. 

A brief discussion of the learning process from the Bayesian 
view shows that market models (Machine Learning Markets) 
can achieve high performance and are thus attractive to 
practitioners. 

Apart from the modelling and learning theories, the Machine 
Learning Markets model shows promise in practical 
implementations. In the market all agents have the same 
priority, and their behaviours are independent of each 
other, being based only on their own utilities. The model 
can therefore be parallelised, which has long-term appeal 
as people need to deal with more and more data. 

In sum, prediction market models are becoming 
generic combinatorial models. Among the different market 
models, Machine Learning Markets shows particular 
promise of becoming the best market model, for it builds 
the model more generally and elegantly, and offers attractive 
applications in the future. 